Comparing Performance of Different Search Engines through Experiments
Abstract
This chapter reports the results of a project that assesses the performance of a few major search engines from various perspectives. The search engines involved in the study include the Microsoft Search Engine (MSE) when it was in its beta test stage, AllTheWeb, and Yahoo. In a few comparisons, other search engines such as Google and Vivisimo are also included. The study collects statistics such as the average user response time, the average process time for a query reported by MSE, and the number of pages relevant to a query reported by all search engines involved. The project also studies the quality of the search results generated by MSE and other search engines, using RankPower as the metric. We found that MSE performs well in speed and in the diversity of query results, but is weaker in other statistics compared to some other leading search engines. The contribution of this chapter is to review performance evaluation techniques for search engines and to use different measures to assess and compare the quality of different search engines, especially MSE.

Introduction

Search engines, since their inception in the early to mid-1990s, have gone through many stages of development. Early search engines were derived from work on two different, but related, fronts. One was to retrieve, organize, and make searchable the widely available, loosely formatted HTML documents on the Web. The other was then-existing information access tools such as Archie (Emtage, 1992), Gopher (Anklesaria et al., 1993), and WAIS (Wide Area Information Servers) (Kahle, 1991). Archie collects information about numerous FTP sites and provides a searchable interface so users can easily retrieve files from different FTP sites. Gopher provides search tools for the large number of Gopher servers on the Internet. WAIS has functionality similar to that of Archie, except that it concentrates on a wide variety of information on the Internet, not just FTP sites. With the fast development of the Web, search engines designed specifically for the Web started to emerge. Examples include WWWW (the World Wide Web Worm), the then-most-powerful search engine AltaVista, NorthernLight, WebCrawler, Excite, InfoSeek, HotBot, AskJeeves, AllTheWeb, MSNSearch, and, of course, Google. Some of these search engines have disappeared; others were retooled, re-designed, or simply merged; yet others have managed to stay at the front against all the competition. Google, since its inception in 1998, has been the most popular search engine, mostly because of the early success of its core search algorithm, PageRank (Brin & Page, 1998). Search engines today are generally capable of searching not only free text, but also structured information such as databases, as well as multimedia such as audio and video. Some representative work can be found in (Datta et al., 2008) and (Kherfi et al., 2004). More recently, some academic search engines have started to focus on indexing the deep Web and producing knowledge based on the information available on the Web, e.g., the KnowItAll project by Etzioni and his team; see, for example, (Banko et al., 2007). In a relatively short history, many aspects of search engines, including software, hardware, management, and investment, have been researched and advanced. Microsoft, though a latecomer to the Web search business, tried very hard to compete with Google and other leading search engines.
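As background for the PageRank mention above, the following is a minimal, illustrative power-iteration sketch of the idea behind the algorithm; the damping factor of 0.85 and the toy three-page link graph are assumptions chosen for demonstration, not values taken from this study.

```python
# Minimal power-iteration sketch of PageRank (illustrative only).
# The damping factor and the toy link graph below are assumptions.

def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}          # start from a uniform distribution
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if not outlinks:                     # dangling page: spread rank evenly
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
            else:                                # share rank equally among outlinks
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Toy example: A links to B and C, B links to C, C links back to A.
print(pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]}))
```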
As a result, Microsoft unveiled its own search engine on November 11, 2004, with its Web site at http://beta.search.msn.com (Sherman, 2004). We refer to it as MSE in this discussion. The beta version of the search engine has since evolved into what is now called the Live search engine (http://www.live.com). This chapter reports the results of a project that assesses the performance of the Microsoft search engine, when it was in its beta version, from various perspectives. Specifically, the study collects statistics such as the average user response time, the average process time for a query reported by MSE itself, the number of pages relevant to a query, the quality of the search in terms of RankPower, and comparisons with its competitors. The rest of the chapter is organized as follows. Section 2 provides an overview of search engine performance metrics. The goals and the metrics of this study are described in Section 3. Section 4 discusses the method of study and the experimental settings, followed by the results and their analysis in Section 5. Our thoughts and conclusions about the study are presented in Section 6.

Performance Metrics for Web Search Engines

While user perception is important in measuring the retrieval performance of search engines, quantitative analyses provide more "scientific evidence" that a particular search engine is "better" than another. Traditional measures of recall and precision (Baeza-Yates, 1999) work well for laboratory studies of information retrieval systems. However, they do not capture the performance essence of today's Web information systems, for three basic reasons. The first reason lies in the importance of the rank of retrieved documents in Web search systems. A user of a Web search engine would not go through a list of hundreds or thousands of results; a user typically goes through only a few pages, each containing a few tens of results. The recall and precision measures do not explicitly reflect the ranks of retrieved documents: a relevant document could be listed first or last in the result list, and recall and precision at a given recall value would be the same. The second reason is that Web search systems cannot practically identify and retrieve all the documents relevant to a search query in the whole collection of documents, which is what the recall/precision measure requires. The third reason is that recall and precision form a pair of numbers, and it is not easy for ordinary users to read them and quickly interpret what they mean. To tackle the third problem, researchers (see a summary in (Korfhage, 1997)) have proposed many single-value measures, such as the expected search length (ESL) (Cooper, 1968), the average search length (ASL) (Losee, 1998), the F harmonic mean, the E measure, and others. Meng (2006) compares, through a set of real-life Web search data, the effectiveness of various single-value measures, examining the use and results of ASL, ESL, average precision, the F measure, the E measure, and RankPower applied against a set of Web search results. The experiment data was collected by sending 72 randomly chosen queries to AltaVista (AltaVista, 2005) and MARS (Chen & Meng, 2002; Meng & Chen, 2005).
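Since RankPower serves as the quality metric later in this study, a small sketch may help. It assumes the commonly cited formulation from the RankPower literature, in which the RankPower of the top n results is the sum of the ranks (positions) of the relevant documents divided by the square of their count; under this formulation lower values are better, and the best achievable value approaches 0.5 when all relevant documents occupy the top positions.

```python
# Sketch of the RankPower single-value measure, assuming the S / R^2 form:
# S = sum of the 1-based ranks of relevant results, R = number of relevant results.

def rank_power(relevance, n=None):
    """relevance: ranked list of booleans, True where that result is relevant."""
    top = relevance[:n] if n is not None else relevance
    ranks = [i + 1 for i, rel in enumerate(top) if rel]  # 1-based positions
    if not ranks:
        return float("inf")       # no relevant results: worst possible score
    return sum(ranks) / len(ranks) ** 2

# Example: relevant results at positions 1, 3, and 4 of the first 10 hits.
hits = [True, False, True, True, False, False, False, False, False, False]
print(rank_power(hits, n=10))     # (1 + 3 + 4) / 3^2 = 0.888...
```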
The classic measures of user-oriented performance of an IR system are precision and recall, which can be traced back to the 1960s (Cleverdon et al., 1966; Treu, 1967). Assume a collection of N documents, of which N_r are relevant to the search query. When a query is issued, the IR system returns a list of L results, where L <= N, of which L_r are relevant to the query. Precision P and recall R are defined as follows:

\[ P = \frac{L_r}{L} \qquad \text{and} \qquad R = \frac{L_r}{N_r} \]

Note that 0 <= P <= 1 and 0 <= R <= 1. Essentially, precision measures the portion of the retrieved results that are relevant to the query, and recall measures the percentage of relevant results retrieved out of the total number of relevant results in the document set. A typical way of measuring precision and recall is to compute the precision at each recall level. A common method is to divide recall into 10 intervals with 11 points ranging from 0.0 to 1.0, and to calculate the precision at each of these recall levels. The goal is to have a high precision rate as well as a high recall rate. Several other measures are related to precision and recall. Average precision and recall (Korfhage, 1997) computes the average of recall and precision over a set of queries. The average precision at seen relevant documents (Baeza-Yates, 1999) takes the average of the precision values obtained after each new relevant document is observed. The R-precision measure (Baeza-Yates, 1999) assumes knowledge of the total number of relevant documents R in the document collection and computes the precision at the R-th retrieved document. The E measure combines precision and recall into a single value:

\[ E = 1 - \frac{1 + b^2}{\frac{b^2}{R} + \frac{1}{P}} \]

where b specifies the relative importance of recall and precision.
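To make these definitions concrete, the sketch below computes precision, recall, the 11-point interpolated precision, and the E measure for a ranked result list; the sample relevance judgments are invented for illustration.

```python
# Hypothetical worked example of the measures defined above.

def precision_recall(relevance, total_relevant):
    """relevance: ranked list of booleans; total_relevant: N_r in the collection."""
    retrieved_relevant = sum(relevance)              # L_r
    precision = retrieved_relevant / len(relevance)  # P = L_r / L
    recall = retrieved_relevant / total_relevant     # R = L_r / N_r
    return precision, recall

def eleven_point_precision(relevance, total_relevant):
    """Interpolated precision at recall levels 0.0, 0.1, ..., 1.0."""
    points, seen = [], 0
    for i, rel in enumerate(relevance, start=1):     # (recall, precision) after
        seen += rel                                  # each retrieved document
        points.append((seen / total_relevant, seen / i))
    levels = [i / 10 for i in range(11)]
    # Interpolated precision at level r: max precision at any recall >= r.
    return [max((p for rc, p in points if rc >= r), default=0.0) for r in levels]

def e_measure(precision, recall, b=1.0):
    """E = 1 - (1 + b^2) / (b^2/R + 1/P); b weighs recall against precision."""
    if precision == 0 or recall == 0:
        return 1.0
    return 1 - (1 + b * b) / (b * b / recall + 1 / precision)

# Suppose N_r = 5 relevant documents exist and the engine returns 10 results,
# with relevant ones at ranks 1, 3, 4, and 8.
hits = [True, False, True, True, False, False, False, True, False, False]
p, r = precision_recall(hits, total_relevant=5)
print(p, r)                                          # 0.4, 0.8
print(eleven_point_precision(hits, 5))
print(e_measure(p, r))                               # equals 1 - F when b = 1
```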